[ckpt] refactor: Consolidate fused expert mappings and fix MTP inference#2685
Conversation
Introduce FusedExpertMapping and FusedGatedExpertMapping in param_mapping.py to handle many-to-one / one-to-many expert weight conversions generically. This eliminates duplicated maybe_modify_converted_hf_weight overrides and hf_weights_cache from GPT-OSS, GLM-4.5, GLM-4.5V, and Qwen3-VL bridges (-502 / +307 lines).

Also fixes two pre-existing bugs:
- GLM-4.5 MTP mappings used stale 'transformer_layer' instead of 'mtp_model_layer', causing missing-mapping warnings
- hf_to_megatron_generate_text.py set mtp_num_layers=None, which crashed MTP-enabled models; replaced with m.mtp_process=False

Signed-off-by: Yu Yao <yaoyu.094@gmail.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Made-with: Cursor
/ok to test ff3705b
📝 Walkthrough

This PR introduces fused expert mapping classes and grouped export accumulation logic for optimized MoE weight conversion, replaces legacy per-expert mapping implementations across multiple model bridges (Qwen, GPT-OSS, GLM) with the new fused variants, and updates MTP inference handling by conditionally disabling the mtp_process attribute.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ModelBridge as MegatronModelBridge
    participant Accum as Grouped<br/>Accumulator
    participant GrpBuf as GroupedBuffers<br/>(Tensor Cache)
    participant Output as Merged<br/>Tensor Dict

    Client->>ModelBridge: stream_weights_hf_to_megatron<br/>(with grouped_export mapping)
    ModelBridge->>ModelBridge: Detect is_grouped_export=True
    ModelBridge->>GrpBuf: Initialize grouped_buffers[group_key]
    loop For Each Expert in Group
        ModelBridge->>ModelBridge: Load HF weight slice
        ModelBridge->>Accum: _accumulate_grouped_export<br/>(expert_idx, weight)
        Accum->>GrpBuf: Store per-expert weight<br/>at global expert index
    end
    Accum->>Accum: All experts collected?
    alt Yes - Group Complete
        Accum->>Accum: Stack/merge expert<br/>tensors into single tensor
        Accum->>Accum: Optionally transpose<br/>to match shape
        Accum->>Output: Return merged dict
        Output->>Client: Yield merged tensor
    else No - Still Accumulating
        Accum->>Client: Yield None (continue)
    end
```
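The accumulate-then-merge step in the diagram can be sketched as follows. This is a minimal stand-in, not the bridge's real `_accumulate_grouped_export`: the function name's behavior is inferred from the diagram, and plain lists stand in for torch tensors (real code would `torch.stack` along a new leading expert dimension).

```python
def accumulate_grouped_export(buffers, group_key, expert_idx, num_experts, weight):
    """Collect per-expert weights for one fused group.

    Returns None while the group is still accumulating; once every expert
    has arrived, merges them in global expert order, frees the buffer, and
    returns the merged result.
    """
    group = buffers.setdefault(group_key, {})
    group[expert_idx] = weight
    if len(group) < num_experts:
        return None  # group incomplete: caller yields nothing yet
    # All experts collected: merge in expert-index order and free the cache.
    merged = [group[i] for i in range(num_experts)]
    del buffers[group_key]
    return merged
```

Each call either returns `None` (keep streaming) or the fully merged group, matching the `alt`/`else` branches in the diagram.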
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 3
⚠️ Outside diff range comments (1)
src/megatron/bridge/models/glm/glm45_bridge.py (1)
Lines 218-300: ⚠️ Potential issue | 🟠 Major

Add dual-prefix support for MTP layer mappings to handle both Megatron-Core naming conventions.

The MTP mappings currently hard-code only `mtp_model_layer` in the explicit QKV/MLP/expert mappings (lines 250, 256, 262, 267, 277, 284, 295, 300) and in the generated `AutoMapping` entries at line 218. Megatron-Core may expose the MTP submodule as `transformer_layer` instead, which will leave MTP weights unmapped for those checkpoints. Follow the pattern in `mimo_bridge.py` by iterating over both prefixes to ensure compatibility across different Megatron-Core versions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/megatron/bridge/models/glm/glm45_bridge.py` around lines 218 - 300, The MTP mappings only use the "mtp_model_layer" prefix causing missed mappings when Megatron exposes the submodule as "transformer_layer"; update the mapping construction to loop over both prefixes (e.g., prefixes = ["mtp_model_layer", "transformer_layer"]) and add mappings for each prefix so every place that currently constructs megatron_param with "mtp_model_layer" (including the AutoMapping entries and the specialized mappings: QKVMapping, GatedMLPMapping, GLMExpertGateUpProjMapping, GLMExpertDownProjMapping, and the existing AutoMapping for experts) is duplicated/created for the alternate "transformer_layer" prefix; follow the pattern used in mimo_bridge.py to generate entries for both prefixes and append them to mapping_list.
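The dual-prefix fix the review asks for amounts to generating every MTP mapping once per prefix. A minimal sketch, assuming a hypothetical param-name template (only the two prefix names come from the review; the template string is illustrative, not the bridge's real mapping):

```python
def build_mtp_mapping_params(layer_idx: int) -> list[str]:
    """Emit one Megatron param name per MTP submodule prefix.

    Real code would append full QKVMapping/GatedMLPMapping/AutoMapping
    objects to mapping_list for each prefix, as mimo_bridge.py does.
    """
    params = []
    for prefix in ("mtp_model_layer", "transformer_layer"):
        params.append(
            f"mtp.layers.{layer_idx}.{prefix}.self_attention.linear_qkv.weight"
        )
    return params
```

Whichever name Megatron-Core actually exposes, one of the generated mappings matches and the MTP weights are no longer skipped.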
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/conversion/hf_to_megatron_generate_text.py`:
- Around line 171-172: The current change only flips the model instance flag
m.mtp_process, but you must also disable MTP at the config level and clear
mixed-precision scaling to avoid NCCL hangs: when you see the block that checks
hasattr(m, "mtp_process") and sets m.mtp_process = False, also set
m.config.mtp_num_layers = None (or 0 if config expects an int) and set
m.grad_scale_func = None, using attribute existence checks before assignment to
avoid attribute errors; update the same function/section that handles
m.mtp_process so all three changes are applied together.
In `@src/megatron/bridge/models/glm/glm_moe_mappings.py`:
- Around line 21-23: Module currently only re-exports GLMExpertDownProjMapping
causing import-time failure where GLMExpertGateUpProjMapping is expected; add a
matching re-export for the gate mapping by importing the appropriate symbol from
megatron.bridge.models.conversion.param_mapping and aliasing it to
GLMExpertGateUpProjMapping (mirror the existing pattern used for
GLMExpertDownProjMapping), so downstream code that imports and instantiates
GLMExpertGateUpProjMapping will succeed.
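The re-export pattern being requested can be demonstrated with a stand-in module (the real import path is `megatron.bridge.models.conversion.param_mapping`; the stub below only exists so the snippet is self-contained, and the class bodies are placeholders):

```python
import sys
import types

# Stand-in for param_mapping, which defines the renamed fused classes.
pm = types.ModuleType("param_mapping_stub")

class FusedExpertMapping:  # placeholder body
    pass

class FusedGatedExpertMapping:  # placeholder body
    pass

pm.FusedExpertMapping = FusedExpertMapping
pm.FusedGatedExpertMapping = FusedGatedExpertMapping
sys.modules["param_mapping_stub"] = pm

# The glm_moe_mappings.py fix: re-export BOTH fused classes under the old
# GLM names so existing bridge imports (glm45_bridge.py, glm_45v_bridge.py)
# keep resolving.
from param_mapping_stub import FusedExpertMapping as GLMExpertDownProjMapping  # noqa: F401
from param_mapping_stub import FusedGatedExpertMapping as GLMExpertGateUpProjMapping  # noqa: F401
```

The aliases are plain name bindings, so downstream `isinstance` checks and instantiation behave identically to importing the fused classes directly.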
In `@src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py`:
- Around line 121-130: The quantized-path returning _dequantize_mxfp4(blocks,
scales) doesn't mirror the direct-tensor branch's transpose for 3D expert
weights, causing expert tensors to keep HF layout; update the branch handling
blocks_key/scales_key so that after calling _dequantize_mxfp4 you detect if
hf_param contains ".mlp.experts." and the returned tensor has ndim == 3, then
transpose the last two axes (i.e., swap -1 and -2) before returning; locate this
logic around the hf_param string branch that references hf_state_dict,
_dequantize_mxfp4, and the ".mlp.experts." selector to apply the fix.
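The layout fix for the quantized path can be sketched as below. Nested lists stand in for tensors so the snippet is self-contained (real code would call `tensor.transpose(-1, -2)` on the `_dequantize_mxfp4` output); the function names here are illustrative, not the bridge's.

```python
def _transpose_last_two(t):
    """Swap the last two axes of a 3D nested-list 'tensor'
    (stand-in for torch.Tensor.transpose(-1, -2))."""
    return [[list(col) for col in zip(*mat)] for mat in t]

def dequantized_expert_weight(hf_param, dequantized):
    """Mirror the direct-tensor branch: for 3D expert weights coming out of
    the MXFP4 dequantize path, transpose the last two axes so the layout
    matches what the direct-tensor branch produces."""
    is_expert = ".mlp.experts." in hf_param
    is_3d = (
        isinstance(dequantized, list)
        and dequantized and isinstance(dequantized[0], list)
        and dequantized[0] and isinstance(dequantized[0][0], list)
    )
    if is_expert and is_3d:
        return _transpose_last_two(dequantized)
    return dequantized
```

Non-expert params (and anything not 3D) pass through untouched, so only the expert tensors change layout.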
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 32a098f8-3a4b-4da8-b6f7-927e1570c4c4
📒 Files selected for processing (10)
- examples/conversion/hf_to_megatron_generate_text.py
- src/megatron/bridge/models/conversion/__init__.py
- src/megatron/bridge/models/conversion/model_bridge.py
- src/megatron/bridge/models/conversion/param_mapping.py
- src/megatron/bridge/models/glm/glm45_bridge.py
- src/megatron/bridge/models/glm/glm_moe_mappings.py
- src/megatron/bridge/models/glm_vl/glm_45v_bridge.py
- src/megatron/bridge/models/gpt_oss/gpt_oss_bridge.py
- src/megatron/bridge/models/qwen_vl/qwen35_vl_bridge.py
- src/megatron/bridge/models/qwen_vl/qwen3_vl_bridge.py
💤 Files with no reviewable changes (1)
- src/megatron/bridge/models/glm_vl/glm_45v_bridge.py
/ok to test ff3705b
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
/ok to test 8d66144
…mappings

The refactor in param_mapping.py renamed GLMExpertGateUpProjMapping to FusedGatedExpertMapping but only added the GLMExpertDownProjMapping alias in glm_moe_mappings.py. Add the missing alias so existing bridge imports (glm45_bridge.py, glm_45v_bridge.py) continue to work.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
/ok to test 2dfbb99
Split the multi-name import block into two separate import statements, each with a per-line # noqa: F401 comment, to satisfy ruff's import block formatting requirements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
/ok to test 3cc9686
…ext tests

- Set PROVIDER_CLASS = Qwen3NextModelProvider so super().provider_bridge() instantiates the correct provider (not GPTModelProvider, which lacks MLA/hybrid fields like q_lora_rank)
- Add a `value is not None` guard in hf_config_to_provider_kwargs to skip None-valued config fields
- Add a null_attr fixture loop in test mocks to suppress Mock() objects for MLA/alternative-expert CONFIG_MAPPING fields

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
/ok to test 29616bc
cuichenx left a comment
please verify roundtrip and inference outputs of these models
```python
for m in model:
    m.config.mtp_num_layers = None
    if hasattr(m, "mtp_process"):
        m.mtp_process = False
```
make use of _disable_mtp function from hf_to_megatron_generate_vlm.py?
yes need to use
```python
# Disable MTP for inference (MTP is only used during training)
def _disable_mtp(m):
    """Disable MTP on a model by clearing mtp_process on the language model."""
    m.config.mtp_num_layers = None
    inner = m.module if hasattr(m, "module") else m
    lang = getattr(inner, "language_model", inner)
    if hasattr(lang, "mtp_process"):
        lang.mtp_process = False
```
/ok to test a03ca50
…export

With etp=1 and ep=1, TEGroupedLinear uses explicit_expert_comm=False, so expert weights are stored in [out, in] (PyTorch) rather than [in, out] (TE) layout. The unconditional transpose_on_export=True in GLMExpertDownProjMapping then incorrectly flips the stacked [8, 1024, 512] to [8, 512, 1024], causing torch.allclose to raise a shape mismatch in the GLM-4.5 TP=2 round-trip test.

Fix: when hf_state_dict is available, only transpose if the stacked shape doesn't already match the original HF shape but the transposed shape does (same adaptive logic as the old maybe_modify_converted_hf_weight). Fall back to unconditional transpose when no HF reference is available.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
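The adaptive-transpose rule described in this commit message can be sketched like this. Nested lists and a shape helper stand in for tensors and `.shape` (the helper names are illustrative, not the mapping's real methods):

```python
def _shape(t):
    """Shape tuple of a rectangular nested list (stand-in for tensor.shape)."""
    s = []
    while isinstance(t, list):
        s.append(len(t))
        t = t[0]
    return tuple(s)

def _transpose_last_two(t):
    """Swap the last two axes of a 3D nested-list 'tensor'."""
    return [[list(col) for col in zip(*mat)] for mat in t]

def adaptive_export_transpose(stacked, hf_ref=None):
    """Transpose only when the stacked layout disagrees with the HF reference.

    - shapes already match           -> keep as-is ([out, in] PyTorch layout)
    - transposed shape matches       -> transpose ([in, out] TE layout)
    - no HF reference available      -> fall back to unconditional transpose
    """
    transposed = _transpose_last_two(stacked)
    if hf_ref is None:
        return transposed
    if _shape(stacked) == _shape(hf_ref):
        return stacked
    if _shape(transposed) == _shape(hf_ref):
        return transposed
    return stacked
```

This reproduces the behavior described above: with etp=1/ep=1 the stacked tensor already matches the HF shape and is left alone, while the TE layout still gets flipped.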
/ok to test c336f9c
Summary

- Introduce `FusedExpertMapping` and `FusedGatedExpertMapping` in `param_mapping.py` to handle many-to-one / one-to-many expert weight conversions generically via the `is_grouped_export` / `group_key` protocol
- Eliminate duplicated `maybe_modify_converted_hf_weight` overrides and `hf_weights_cache` from GPT-OSS, GLM-4.5, GLM-4.5V, and Qwen3-VL bridges (net -195 lines)
- Add `_accumulate_grouped_export` to `MegatronModelBridge` and `_hf_import_cache` for grouped import, centralizing the expert merge/split logic
- Replace stale `transformer_layer` with `mtp_model_layer` and propagate `mtp_num_layers` from the HF config
- `hf_to_megatron_generate_text.py`: replace `mtp_num_layers=None` (crashes MTP-enabled models) with `m.mtp_process=False`

Test plan
Made with Cursor